Accelerating Materials Discovery: Learning a Universal Representation of Chemical Processes for Cross-Domain Property Prediction
Tsitsvero, Mikhail, Nakao, Atsuyuki, Ikebata, Hisaki
Experimental validation of chemical processes is slow and costly, limiting exploration in materials discovery. Machine learning can prioritize promising candidates, but existing data in patents and literature is heterogeneous and difficult to use. We introduce a universal directed-tree process-graph representation that unifies unstructured text, molecular structures, and numeric measurements into a single machine-readable format. To learn from this structured data, we developed a multi-modal graph neural network with a property-conditioned attention mechanism. Trained on approximately 700,000 process graphs from nearly 9,000 diverse documents, our model learns semantically rich embeddings that generalize across domains. When fine-tuned on compact, domain-specific datasets, the pretrained model achieves strong performance, demonstrating that universal process representations learned at scale transfer effectively to specialized prediction tasks with minimal additional data.
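A minimal sketch of what such a directed-tree process graph might look like as a data structure: each node is one process step carrying heterogeneous payloads (free text, molecular identifiers such as SMILES strings, and numeric measurements), with edges pointing to downstream steps. The field names and schema here are illustrative assumptions, not the paper's actual representation.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessNode:
    """One step in a directed-tree process graph (illustrative schema)."""
    text: str                                          # unstructured description
    molecules: list = field(default_factory=list)      # e.g. SMILES strings
    measurements: dict = field(default_factory=dict)   # name -> (value, unit)
    children: list = field(default_factory=list)       # downstream steps

def count_steps(root: ProcessNode) -> int:
    """Total number of steps in the directed tree."""
    return 1 + sum(count_steps(c) for c in root.children)

# A toy two-step synthesis: mix precursors, then anneal the product.
anneal = ProcessNode(
    "anneal at 450 C for 2 h",
    measurements={"temperature": (450.0, "C"), "time": (2.0, "h")},
)
mix = ProcessNode(
    "mix precursors in ethanol",
    molecules=["CCO"],          # ethanol as a SMILES string
    children=[anneal],
)
```

A node like this keeps all three modalities side by side, which is the property the abstract attributes to the unified format.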
M$^2$OE$^2$-GL: A Family of Probabilistic Load Forecasters That Scales to Massive Customers
Li, Haoran, Cheng, Zhe, Guo, Muhao, Weng, Yang, Sun, Yannan, Tran, Victor, Chainaranont, John
Probabilistic load forecasting is widely studied and underpins power system planning, operation, and risk-aware decision making. Deep learning forecasters have shown strong ability to capture complex temporal and contextual patterns, achieving substantial accuracy gains. However, at the scale of thousands or even hundreds of thousands of loads in large distribution feeders, a deployment dilemma emerges: training and maintaining one model per customer is prohibitive in computation and storage, while a single global model ignores distributional shifts across customer types, locations, and phases. Prior work typically focuses on single-load forecasters, global models across multiple loads, or adaptive/personalized models for relatively small settings, and rarely addresses the combined challenges of heterogeneity and scalability in large feeders. We propose M$^2$OE$^2$-GL, a global-to-local extension of the M$^2$OE$^2$ probabilistic forecaster. We first pretrain a single global M$^2$OE$^2$ base model across all feeder loads, then apply lightweight fine-tuning to derive a compact family of group-specific forecasters. Evaluated on realistic utility data, M$^2$OE$^2$-GL yields substantial error reductions while remaining scalable to very large numbers of loads.
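The global-to-local recipe (pretrain one shared model, then cheaply adapt a copy per customer group) can be sketched as follows. The tiny Gaussian-output MLP stands in for the M$^2$OE$^2$ forecaster itself, and the choice to freeze the trunk and adapt only the output head is an illustrative assumption about what "lightweight fine-tuning" could mean, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Stand-in probabilistic forecaster: 24 lagged loads -> (mean, log-variance)."""
    def __init__(self, n_in: int = 24, n_hidden: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU())
        self.head = nn.Linear(n_hidden, 2)

    def forward(self, x):
        out = self.head(self.trunk(x))
        return out[..., 0], out[..., 1]          # mu, log_var

def fine_tune_group(global_model, x, y, steps: int = 50):
    """Clone the pretrained global model, freeze the trunk, adapt only the head."""
    local = copy.deepcopy(global_model)
    for p in local.trunk.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(local.head.parameters(), lr=1e-2)
    for _ in range(steps):
        mu, log_var = local(x)
        # Gaussian negative log-likelihood, the usual probabilistic loss
        loss = (log_var + (y - mu) ** 2 / log_var.exp()).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return local

global_model = Forecaster()                       # pretrained on all feeder loads
x, y = torch.randn(64, 24), torch.randn(64)      # one customer group's history
local_model = fine_tune_group(global_model, x, y)
```

Because only the small head is trained per group, the storage cost of the whole family stays close to that of the single global model.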
LAYA: Layer-wise Attention Aggregation for Interpretable Depth-Aware Neural Networks
Deep neural networks typically rely on the representation produced by their final hidden layer to make predictions, implicitly assuming that this single vector fully captures the semantics encoded across all preceding transformations. However, intermediate layers contain rich and complementary information -- ranging from low-level patterns to high-level abstractions -- that is often discarded when the decision head depends solely on the last representation. This paper revisits the role of the output layer and introduces LAYA (Layer-wise Attention Aggregator), a novel output head that dynamically aggregates internal representations through attention. Instead of projecting only the deepest embedding, LAYA learns input-conditioned attention weights over layer-wise features, yielding an interpretable and architecture-agnostic mechanism for synthesizing predictions. Experiments on vision and language benchmarks show that LAYA consistently matches or improves the performance of standard output heads, with relative gains of up to about one percentage point in accuracy, while providing explicit layer-attribution scores that reveal how different abstraction levels contribute to each decision. Crucially, these interpretability signals emerge directly from the model's computation, without any external post hoc explanations. The code to reproduce LAYA is publicly available at: https://github.com/gvessio/LAYA.
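The mechanism described above, input-conditioned attention over per-layer features instead of classifying from the last layer alone, can be sketched as below. Layer vectors are assumed to be pooled to one vector per layer, and the query/key construction is an illustrative choice; see the linked repository for the authors' actual implementation.

```python
import torch
import torch.nn as nn

class LayerAttentionHead(nn.Module):
    """LAYA-style output head sketch: attend over all layers' representations."""
    def __init__(self, hidden_dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)   # query from the last layer
        self.key = nn.Linear(hidden_dim, hidden_dim)     # keys from every layer
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, layer_feats: torch.Tensor):
        # layer_feats: (batch, num_layers, hidden_dim), one pooled vector per layer
        q = self.query(layer_feats[:, -1])               # (batch, hidden_dim)
        k = self.key(layer_feats)                        # (batch, num_layers, hidden_dim)
        scores = torch.einsum("bd,bld->bl", q, k) / k.shape[-1] ** 0.5
        attn = scores.softmax(dim=-1)                    # layer-attribution weights
        mixed = torch.einsum("bl,bld->bd", attn, layer_feats)
        return self.classifier(mixed), attn              # logits + per-layer scores

head = LayerAttentionHead(hidden_dim=16, num_classes=3)
feats = torch.randn(2, 4, 16)                            # 4 layers of a toy network
logits, attn = head(feats)
```

Returning `attn` alongside the logits is what makes the head interpretable: the weights directly state how much each layer contributed to the prediction.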
Learning Conjoint Attentions for Graph Neural Nets: Supplementary Materials
He, Tiantian, Ong, Yew-Soon, Bai, Lu (Agency for Science, Technology and Research (A*STAR))
To prove Theorem 1, we need to consider both directions of the if-and-only-if condition. Obviously, the above equation does not hold, as the terms inside the summation operator are positive. From Eq. (4), we have: [equation omitted]. However, the RHS of Eq. (10) can be an irrational number, while the LHS is a rational number, which is a contradiction. Eq. (9) can be rewritten as: [equation omitted]. To prove Theorem 2, we can follow the procedure used to prove Theorem 1. As Eq. (20) holds for any [text missing], the above equation obviously does not hold, as the softmax function is positive.
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Liu, Xiaohao, Xia, Xiaobo, Zhao, Weixiang, Zhang, Manyi, Yu, Xianzhi, Su, Xiu, Yang, Shuo, Ng, See-Kiong, Chua, Tat-Seng
Large language models (LLMs) have achieved notable progress. Despite their success, next-token prediction (NTP), the dominant method for LLM training and inference, is constrained in both contextual coverage and inference efficiency due to its inherently sequential process. To overcome these challenges, we propose leap multi-token prediction (L-MTP), an innovative token prediction method that extends the capabilities of multi-token prediction (MTP) by introducing a leap-based mechanism. Unlike conventional MTP, which generates multiple tokens at adjacent positions, L-MTP strategically skips over intermediate tokens, predicting non-sequential ones in a single forward pass. This structured leap not only enhances the model's ability to capture long-range dependencies but also enables a decoding strategy specially optimized for non-sequential leap token generation, effectively accelerating inference. We theoretically demonstrate the benefit of L-MTP in improving inference efficiency.
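The difference between standard MTP and the leap variant can be sketched at the level of the prediction heads: from one hidden state, each head targets a strided future offset (e.g. +1, +3, +5) rather than the adjacent offsets (+1, +2, +3). The specific offsets and layer names here are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LeapMTPHeads(nn.Module):
    """Sketch of leap multi-token prediction heads over a shared trunk."""
    def __init__(self, hidden_dim: int, vocab_size: int, offsets=(1, 3, 5)):
        super().__init__()
        self.offsets = offsets
        # One lightweight head per leap offset; all share the trunk's hidden state.
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, vocab_size) for _ in offsets
        )

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, hidden_dim) from the trunk transformer.
        # Returns one logit tensor per leap offset, all from a single forward pass.
        return {off: head(hidden) for off, head in zip(self.offsets, self.heads)}

mtp = LeapMTPHeads(hidden_dim=8, vocab_size=11)
logits = mtp(torch.randn(2, 5, 8))
```

Predicting non-adjacent positions in one pass is what opens the door to the specialized decoding strategy the abstract mentions: drafts at strided positions can be interleaved and verified to accelerate generation.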